Physical Biology
IOP Publishing
Preprints posted in the last 7 days, ranked by how well they match Physical Biology's content profile, based on 43 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is already an above-average fit.
Ballatore, F.; Madzvamuse, A.; Jebane, C.; Helfer, E.; Allena, R.
Understanding how cells migrate through confined environments is crucial for elucidating fundamental biological processes, including cancer invasion, immune surveillance, and tissue morphogenesis. The nucleus, as the largest and stiffest cellular organelle, often limits cellular deformability, making it a key factor in migration through narrow pores or highly constrained spaces. In this work, we introduce a geometric surface partial differential equation (GS-PDE) model in which the cell plasma membrane and nuclear envelope are described as evolving energetic closed surfaces governed by force-balance equations. We replicate the results of a biophysical experiment, where a microfluidic device is used to impose compressive stresses on cells by driving them through narrow microchannels under a controlled pressure gradient. The model is validated by reproducing cell entry into the microchannels. A parametric sensitivity analysis highlights the dominant influence of specific parameters, whose accurate estimation is essential for faithfully capturing the experimental setup. We found that surface tension and confinement geometry emerge as key determinants of translocation efficiency. Although tailored to this specific setup for validation purposes, the framework is sufficiently general to be applied to a broad range of cell mechanics scenarios, providing a robust and flexible tool for investigating the interplay between cell mechanics and confinement. It also offers a solid foundation for future extensions integrating more complex biochemical processes such as active confined migration.
Yang, F.; Hanks, E. M.; Conway, J. M.; Bjornstad, O. N.; Thanh, N. T. L.; Boni, M. F.; Servadio, J. L.
Infectious disease surveillance systems in tropical countries show that respiratory disease incidence generally manifests as year-round activity with weak fluctuations and irregular seasonality. Previously, using a ten-year time series of influenza-like illness (ILI) collected from outpatient clinics in Ho Chi Minh City (HCMC), Vietnam, we found a combination of nonannual and annual signals driving these dynamics, but with unknown mechanisms. In this study, we use seven stochastic dynamical models incorporating humidity, temperature, and school term to investigate plausible mechanisms behind these annual and nonannual incidence trends. We use iterated filtering to fit the models and evaluate the models by comparing how well they replicate the combination of annual and nonannual signals. We find that a model including specific humidity, temperature, and school term best fits our observed data from HCMC and partially reproduces the irregular seasonality. The estimated effects from specific humidity and temperature on transmission are nonlinearly negative but weak. School dismissal is associated with decreased transmission, but also with low magnitude. Under these weak external drivers, we hypothesize that stochasticity makes a strong sub-annual cycle more likely to be observed in ILI disease dynamics. Our study shows a possible mechanism for respiratory disease dynamics in the tropics. When the external drivers are weak, the seasonality of respiratory disease dynamics is prone to the influence of stochasticity.
Smah, M. L.; Seale, A.; Rock, K.
Infectious disease dynamics are strongly shaped by human mobility, social structure, and heterogeneous contact patterns, yet many epidemic models do not jointly capture these features. This study develops a spatial metapopulation epidemic model incorporating recurrent group-switch interactions to represent real-world transmission processes. Building on the Movement-Interaction-Return framework, the model integrates household structure, age-stratified contacts, and mobility between locations within a single SEIR framework. Using UK demographic, mobility, and social contact data, the model quantifies how within- and between-group interactions, mobility rates, and location connectivity influence epidemic spread. Both deterministic and stochastic simulations are implemented to analyse outbreak dynamics, variability, and fade-out probabilities for COVID-19-like and Ebola-like infections. Results show that highly connected locations drive faster transmission, earlier epidemic peaks, and greater difficulty in containment, whereas larger but less connected locations tend to produce slower, more localised outbreaks despite their population size. Comparative analysis reveals that COVID-19-like infections spread rapidly and remain difficult to control even under interventions, while Ebola-like infections exhibit slower dynamics and are more effectively contained, particularly under targeted measures. Non-pharmaceutical interventions, particularly widespread closures, substantially reduce infections, hospitalisations, and deaths, although effectiveness depends on timing and pathogen characteristics. These findings highlight the importance of integrating mobility, clustering, and demographic heterogeneity to inform targeted and effective epidemic control strategies.
Smah, M. L.; Seale, A. C.; Rock, K. S.
Network-based epidemic models have been instrumental in understanding how contact structure shapes infectious disease dynamics, yet widely used frameworks such as Erdős-Rényi, configuration-model, and stochastic block networks do not explicitly capture the combination of fully accessible (saturated) within-group interactions and constrained between-group connectivity characteristic of many real-world settings. Here, we introduce the Multi-Clique (MC) network model, a generative framework in which individuals are organised into fully connected cliques representing stable contact groups (e.g., households, classrooms, or workplaces), with a limited number of external connections governing inter-group transmission. Using stochastic susceptible-infectious-recovered (SIR) simulations on degree-matched networks, we compare epidemic dynamics on MC networks with those on classical random graph models. Despite having an identical mean degree, MC networks exhibit systematically distinct behaviour, including slower epidemic growth, reduced peak prevalence, increased fade-out probability, and delayed time to peak. These effects arise from rapid within-clique but constrained between-clique transmission, creating structural bottlenecks that standard models do not capture. The MC framework provides an interpretable, data-driven representation of recurrent contact structure, with parameters that map directly to observable quantities such as household and classroom sizes. By isolating the role of intergroup connectivity, the model offers a basis for evaluating targeted intervention strategies that reduce between-group mixing while preserving within-group interactions. Our results highlight the importance of explicitly representing the real-life clique-based network structure in epidemic models and suggest that classical degree-matched networks may systematically overestimate epidemic speed and intensity in structured populations.
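The generative idea in this abstract (fully connected cliques joined by a limited number of external links) can be illustrated with a short Python sketch. This is not the authors' generator: the function name `multi_clique_graph`, the equal clique sizes, and the rule that each clique contributes a fixed number of uniformly random external edges are all assumptions made for illustration.

```python
import itertools
import random


def multi_clique_graph(n_cliques, clique_size, ext_per_clique, seed=0):
    """Build a toy multi-clique graph (hypothetical construction).

    Nodes are integers. Every clique is fully connected internally, and
    each clique then adds `ext_per_clique` new edges to randomly chosen
    nodes in other cliques. Assumes n_cliques >= 2 and that enough
    distinct external edges exist to satisfy the request.
    """
    rng = random.Random(seed)
    cliques = [
        list(range(c * clique_size, (c + 1) * clique_size))
        for c in range(n_cliques)
    ]
    edges = set()
    # Saturated within-group contacts: all pairs inside each clique.
    for members in cliques:
        edges.update(itertools.combinations(members, 2))
    # Constrained between-group contacts: a few random external links.
    for c, members in enumerate(cliques):
        added = 0
        while added < ext_per_clique:
            u = rng.choice(members)
            other = rng.choice([k for k in range(n_cliques) if k != c])
            v = rng.choice(cliques[other])
            edge = (min(u, v), max(u, v))
            if edge not in edges:
                edges.add(edge)
                added += 1
    return cliques, edges


cliques, edges = multi_clique_graph(n_cliques=5, clique_size=4, ext_per_clique=2)
n_nodes = sum(len(m) for m in cliques)
mean_degree = 2 * len(edges) / n_nodes
print(len(edges), mean_degree)
```

The mean degree computed at the end is the quantity one would match when building a comparison random graph, as in the degree-matched experiments the abstract describes.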
Ben-Joseph, J.
Lightweight epidemic calculators are widely used for teaching and rapid scenario exploration, yet many omit the methodological detail needed for scientific reuse. We present a browser-native SIR calculator that exposes forward Euler and classical fourth-order Runge-Kutta (RK4) integration alongside epidemiologically interpretable outputs and a population-conservation diagnostic. The implementation is anchored to analytical properties of the deterministic SIR system, including the epidemic threshold, the peak condition, and the final-size relation. Benchmark experiments show that RK4 is essentially step-size invariant over practical discretizations, whereas Euler at a coarse one-day step overestimates peak prevalence by 3.97% and final size by 0.66% relative to a fine-step RK4 reference. These results demonstrate that browser-based tools can support publication-quality computational narratives when solver choice, diagnostics, and assumptions are treated as first-class outputs.
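The solver comparison described in this abstract can be sketched in a few lines of Python. The code below is an illustrative reconstruction, not the calculator's implementation: the parameter values (`beta=0.3`, `gamma=0.1`, a population of 1000, one initial infection, a 160-day horizon) are assumptions chosen here, so the printed differences will not match the percentages reported in the preprint.

```python
def sir_rhs(s, i, r, beta, gamma, n):
    """Right-hand side of the deterministic SIR system."""
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return -new_infections, new_infections - recoveries, recoveries


def euler_step(state, dt, beta, gamma, n):
    """One forward Euler step."""
    k = sir_rhs(*state, beta, gamma, n)
    return tuple(x + dt * dx for x, dx in zip(state, k))


def rk4_step(state, dt, beta, gamma, n):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = sir_rhs(*state, beta, gamma, n)
    k2 = sir_rhs(*shifted(k1, dt / 2), beta, gamma, n)
    k3 = sir_rhs(*shifted(k2, dt / 2), beta, gamma, n)
    k4 = sir_rhs(*shifted(k3, dt), beta, gamma, n)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(step, dt, days=160, beta=0.3, gamma=0.1, n=1000.0, i0=1.0):
    """Integrate SIR and report peak prevalence, final size, and the
    largest deviation of S + I + R from n (conservation diagnostic)."""
    state = (n - i0, i0, 0.0)
    peak, max_drift = state[1], 0.0
    for _ in range(round(days / dt)):
        state = step(state, dt, beta, gamma, n)
        peak = max(peak, state[1])
        max_drift = max(max_drift, abs(sum(state) - n))
    return {"peak": peak, "final_size": state[2], "max_drift": max_drift}


reference = simulate(rk4_step, dt=0.01)   # fine-step RK4 reference
for name, step in (("euler", euler_step), ("rk4", rk4_step)):
    out = simulate(step, dt=1.0)          # coarse one-day step
    rel_peak = (out["peak"] - reference["peak"]) / reference["peak"]
    print(f"{name}: peak error {rel_peak:+.3%}, drift {out['max_drift']:.2e}")
```

Reporting the conservation drift next to the epidemiological outputs mirrors the abstract's point that diagnostics should be treated as outputs in their own right.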
Ng, J. Y.; Tan, J.; Syed, N.; Adapa, K.; Gupta, P. K.; Li, S.; Mehta, D.; Ring, M.; Shridhar, M.; Souza, J. P.; Yoshino, T.; Lee, M. S.; Cramer, H.
Background: Generative artificial intelligence (GenAI) chatbots have shown utility in assisting with various research tasks. Traditional, complementary, and integrative medicine (TCIM) is a patient-centric approach that emphasizes holistic well-being. The integration of TCIM and GenAI presents numerous key opportunities. However, TCIM researchers' attitudes toward GenAI tools remain less understood. This large-scale, international cross-sectional survey aimed to elucidate the attitudes and perceptions of TCIM researchers regarding the use of GenAI chatbots in the scientific process. Methods: A search strategy in Ovid MEDLINE identified corresponding authors who were TCIM researchers. Eligible authors were invited to complete an anonymous online survey administered via SurveyMonkey. The survey included questions on socio-demographic characteristics, familiarity with GenAI chatbots, and perceived benefits and challenges of using GenAI chatbots. Results were analysed using descriptive statistics and thematic content analysis. Results: The survey received 716 responses. Most respondents reported familiarity with GenAI chatbots (58.08%) and viewed them as very important to the future of scientific research (54.37%). The most acknowledged benefits included workload reduction (74.07%) and increased efficiency in data analysis/experimentation (71.14%). The most frequently reported challenges involved bias, errors, and limitations. More than half of the respondents (57.02%) expressed a need for training to use GenAI chatbots in the scientific process, alongside an interest in receiving training (72.07%). However, 43.67% indicated that their institutions did not offer these programs. Discussion: By developing a deeper understanding of TCIM researchers' perspectives, future AI applications in this field can be more informed, and guide future policies and collaboration among researchers.
Walton, A. E.; Versalovic, E.; Merner, A. R.; Lazaro-Munoz, G.; Bush, A.; Richardson, M.
Patients who participate in intracranial neuroscience research make invaluable contributions to our understanding of the brain, accelerating the development of neurotechnological interventions. Engagement of patients as part of this research presents unique challenges, where study goals can be distant from immediate clinical applications and require specialized domain knowledge. Yet methods for meaningfully integrating patient communities as part of these research efforts are essential, as intracranial neuroscience guides the application of artificial intelligence for understanding and enhancing human cognition. To identify what patients consider meaningful research engagement, we interviewed individuals who participated in a study during their Deep Brain Stimulation (DBS) surgery and attended a group event where they interacted with our research team. Analysis of semi-structured interviews identified four main themes: interest in science and the future of clinical care, contributing to science to improve lives, connecting with others, and accessibility considerations. Based on these insights, we propose strategies for transformational participation of patient communities in intracranial neuroscience research with respect to engagement objectives, communication and scope. This approach offers a foundation for sustaining relationships between scientists and communities rooted in trust and transparency, to ensure that impacts of neurotechnology on human health and cognition are aligned with patient needs as well as desired public values.
Pore, M.; Balamurugan, K.; Atkinson, A.; Breen, D.; Mallory, P.; Cardamone, A.; McKennett, L.; Newkirk, C.; Sharan, S.; Bocik, W.; Sterneck, E.
Circulating tumor cells (CTCs), and especially CTC-clusters, are linked to poor prognosis and may reveal mechanisms of metastasis and treatment resistance. Therefore, developing unbiased methods for the functional characterization of CTCs in liquid biopsies is an urgent need. Here, we present an evaluation of multiplex imaging mass cytometry (IMC) to analyze CTCs in mice with human xenograft tumors. In a single-step process, IMC uses metal-labeled antibodies to simultaneously detect a large number of proteins/modifications within minimally manipulated small volumes of blood from the tail vein or heart. We used breast cancer cell lines and a patient-derived xenograft (PDX) to assess antibodies for cross-species interpretation. Along with manual verification, HALO-AI-based cell segmentation was used to identify CTCs and quantify markers. Despite some limitations regarding human-specificity, this technology can be used to investigate the effect of genetic and pharmacological interventions on the properties of single and cluster CTCs in tumor-bearing mice.
Lin, R.; Halfwerk, F. R.; Donker, D. W.; Tertoolen, J.; van der Pas, V. R.; Laverman, G. D.; Wang, Y.
Objective: Skin sympathetic nerve activity (SKNA) has emerged as a promising non-invasive surrogate measure of sympathetic drive, but its relevant physiological characteristics remain ill-defined. This observational study aims to investigate its regulatory patterns during rest and Valsalva maneuver (VM) in healthy participants. Method: Using a two-layer strategy integrating signal analysis and physiological modelling, we analyzed data recorded from 41 subjects performing repeated VMs. The observational layer includes time-domain feature comparisons using linear mixed-effect models, and time-varying spectral coherence analysis. The mechanistic layer proposes a mathematical model to investigate whether baroreflex and respiratory modulation are sufficient to reproduce the observed HR and average SKNA (aSKNA) dynamics. Main Results: Mean integrated SKNA (iSKNA) showed a more significant change than HRV for VM-induced effects. We also found that the mean iSKNA increase during VM varies with BMI and sex. The coherence analysis indicated that iSKNA strongly synchronized with EDR under resting conditions. The proposed model successfully reproduced the main characteristics of aSKNA dynamics, yielding a high median Pearson correlation coefficient (PCC) of 0.80 ([Q1, Q3] = [0.60, 0.91]). In contrast, HR dynamics were only partially captured, with a median PCC of 0.37 ([Q1, Q3] = [0.16, 0.55]). These results suggest that SKNA provides a more direct representation of sympathetic burst dynamics during VM in healthy subjects. Significance: This study provides convergent evidence that SKNA reflects known autonomic regulatory influences in healthy subjects. These findings strengthen the physiological interpretability of SKNA while clarifying its appropriate use as a practical biomarker of sympathetic function.
Undurraga Lucero, J. A.; Chesnaye, M.; Simpson, D.; Laugesen, S.
Objective detection of evoked potentials (EPs) is central to digital diagnostics in hearing assessment and clinical neurophysiology, yet current approaches remain time-intensive and sensitive to inter-individual noise variability. Many existing detection methods rely on population-based assumptions or computationally demanding procedures, limiting robustness and efficiency in real-world clinical settings. We present Fmpi, a digital EP detection framework enabling individualised, real-time response detection through analytical modelling of the spectral colour and temporal dynamics of background noise within each recording. Using extensive simulations and large-scale human electroencephalography datasets spanning brainstem, steady-state, and cortical EPs recorded in adults and infants, we demonstrate performance comparable or superior to state-of-the-art bootstrapped methods while operating at a fraction of the computational cost and maintaining well-controlled sensitivity with improved specificity. Importantly, Fmpi incorporates a futility detection mechanism enabling early termination of uninformative recordings, reducing testing time without compromising diagnostic reliability.
Luisto, R.; Snell, K.; Vartiainen, V.; Sanmark, E.; Äyrämö, S.
In this study, we investigate gender bias in a Retrieval-Augmented Generation (RAG) based AI assistant developed for Finnish wellbeing services counties. We tested the system using 36 clinically relevant queries, each rendered in three gendered variants (male, female, gender-neutral), and evaluated responses using both an LLM-as-a-judge approach and a human expert panel consisting of a physician and a sociologist specializing in ethics. We observed substantial and clinically significant differences across gendered variants, including differential treatment urgency, inappropriate symptom associations, and misidentification of clinical context. Female variants disproportionately framed responses around childcare and reproductive health regardless of clinical relevance, reflecting societal stereotypes rather than medical reasoning. Bias manifested both at the LLM generation stage and the RAG retrieval stage, in several cases causing the model to hallucinate responses entirely. Some bias patterns were persistent across repeated runs, while others appeared inconsistently, highlighting the challenge of distinguishing systematic bias from stochastic variation.
Cotto, O.; Birgy, A.; Magnan, M.; Bechet, S.; Bonacorsi, S.; Cohen, R.; Levy, C.; Nowrouzian, F. L.; Tenaillon, O.; Blanquart, F.
The worldwide rise in the prevalence of extended-spectrum beta-lactamase (ESBL) producing Escherichia coli is a major public health concern. In Europe, ESBL carriage frequency increased then stabilized at about 6-8%. Past antibiotic use and travel in countries with high ESBL frequency, notably South-East Asia, have repeatedly been identified as risk factors for ESBL carriage. Yet, the relative contributions of these mechanisms to the observed maintenance of a stable low frequency of ESBL in Europe remain unknown. Here, we used comprehensive data on the risk factors for carriage of ESBL-producing E. coli in the French community, alongside detailed microbiological characterization of both resistant and overall E. coli, to develop a biologically plausible mathematical model of ESBL resistance spread in France. The model also includes several mechanisms previously shown to favor coexistence such as population structure, variability in carriage duration and within-host dynamics. The level of resistance in the community implies resistant strains transmit 14% less than sensitive strains (95% credible interval 0.6-38%) and are cleared at a 23% higher rate (0.9-62%). ESBL resistance is predicted to be strongly associated with factors prolonging residence in the gut. Both the rate of antibiotic treatment and transmission strongly impact the frequency of ESBL in the community. In contrast, travel has little impact on ESBL frequency. Whether reducing treatment or transmission is best to reduce resistance depends on community-specific parameters. Our study opens perspectives for the quantitative study of resistance evolution and argues for future work to improve the characterization of the duration of carriage of commensal bacterial strains.
Fadikar, A.; Hotton, A.; de Lima, P. N.; Vardavas, R.; Collier, N.; Jia, K.; Rimer, S.; Khanna, A.; Schneider, J.; Ozik, J.
Detailed agent-based simulations are increasingly used to support policy decisions, but their computational cost and complex uncertainty structure make systematic scenario analysis challenging. We present a data-driven, uncertainty-aware decision support (DDUADS) workflow for using stochastic simulation models as decision-support tools under limited computational budgets. The approach combines several established techniques (sensitivity screening, Bayesian calibration using simulation-based inference, and multi-surrogate model integration for translational efficiency) into a coherent pipeline that enables uncertainty-aware policy analysis. Rather than producing a single baseline, the calibration stage yields a posterior distribution over plausible model parameterizations, allowing flexible, uncertainty-aware forward projections. We demonstrate the DDUADS workflow on the INFORM-HIV agent-based model of HIV transmission in Chicago to evaluate potential disruptions in antiretroviral therapy (ART) and pre-exposure prophylaxis (PrEP) use. While the specific application is HIV modeling, the challenges and techniques described here arise in other simulation studies and can be applied to decision support in other domains.
Dai, H.-J.; Mir, T. H.; Fang, L.-C.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.
Accurate recognition and deidentification of sensitive health information (SHI) in spoken dialogues requires multimodal algorithms that can understand medical language and contextual nuance. However, errors in recognition and deidentification risk exposing SHI. Additionally, the variability and complexity of medical terminology, along with the inherent biases in medical datasets, further complicate this task. This study introduces the SREDH/AI-Cup 2025 Medical Speech Sensitive Information Recognition Challenge, which focuses on two tasks: Task 1, speech transcription, in which systems must accurately transcribe speech into text; and Task 2, medical speech de-identification, in which systems detect and appropriately classify mentions of SHI. The competition attracted 246 teams; top-performing systems achieved a mixed error rate (MER) of 0.1147 and a macro F1-score of 0.7103, with average MER and macro F1-score of 0.3539 and 0.2696, respectively. Results were presented at the IW-DMRN workshop in 2025. Notably, the results reveal that LLMs were prevalent across both tasks: 97.5% of teams adopted LLMs for Task 1 and 100% for Task 2, highlighting their growing role in healthcare. Furthermore, we fine-tuned six models, demonstrating strong precision (~0.885-0.889) with slightly lower recall (~0.830-0.847), resulting in F1-scores of 0.857-0.867.
Bhansali, R.; Gorenshtein, A.; Westover, B.; Goldenholz, D. M.
Manuscript preparation is a critical bottleneck in scientific publishing, yet existing AI writing tools require cloud transmission of sensitive content, creating data-confidentiality barriers for clinical researchers. We introduce the Paper Analysis Tool (PAT), a free, multi-agent framework that deploys 31 specialized agents powered by small language models (SLMs) to audit manuscripts across multiple quality dimensions without external data transmission. Applied to three published clinical neurological papers, PAT generated 540 evaluable suggestions. Validation by two expert reviewers (R.B., A.G.) confirmed 391 actionable, high-value revisions (90% agreement), achieving a 72.4% overall usefulness accuracy spanning methodological, statistical, and visual domains. Furthermore, deterministic re-evaluation of 126 agent-suggested rewrite pairs using Phase 0 metrics confirmed text improvement: total word count decreased by 25%, passive voice prevalence dropped sharply from 35% to 5%, average sentence length decreased by 24%, long-sentence fraction fell by 67%, and the Flesch-Kincaid grade improved by 17%. Our validation confirms that systematic, agent-driven pre-submission review drives measurable improvements, successfully converting manuscript optimization from an opaque, manual endeavor into a transparent and rigorous scientific process.
Challier, V.; Diebo, B.; Lafage, V.; Dehouche, N.; Lonjon, G.; Cristini, J.; SpineDAO
Study Design: Prospective observational study using a novel digital ledger technology (DLT)-based crowdsourcing platform. Objective: To develop and evaluate Spine Reviews, a blockchain-based platform for aggregating spine treatment recommendations from an international specialist panel, and to validate the clinical coherence of the resulting dataset. Summary of Background Data: Predictive models for low back pain treatment are limited by small, homogeneous datasets that fail to capture inter-clinician variability. Traditional multi-center data collection is expensive, slow, and geographically constrained. DLT-based crowdsourcing with cryptographic credentialing may overcome these barriers. Methods: Five hundred synthetic patient vignettes (digital twins) were generated; 463 retained after quality control. A review platform was built on the Solana blockchain using non-transferable Soulbound Tokens (SBTs) for credentialing and smart-contract compensation. Fifty-two specialists from 7 countries provided 4+ reviews per vignette across four treatment tiers, without access to imaging or physical examination. Mixed-effects regression with reviewer random intercepts partitioned decision variability. Results: The platform collected 2,066 completed reviews (97.7%) over 37 days at USD 0.97/review. Variance decomposition revealed that 36.7% of treatment tier variability was attributable to patient presentation, 19.2% to reviewer practice style, and 44.1% to their interaction. Neurological deficits (beta=0.39), symptom duration (beta=0.12), and pain (beta=0.09) independently predicted treatment escalation (all p<0.001). Gwet's AC1 was almost perfect for emergency (0.92) and substantial for conservative decisions (0.67). Reviewer confidence in treatment recommendations decreased with escalating tier severity (conservative 4.59/5 vs surgical 4.05/5), suggesting appropriate uncertainty calibration. 
Conclusions: DLT with SBT credentialing enables rapid, global, cost-effective aggregation of clinically coherent expert judgment. The three-component variance structure quantifies clinical equipoise in spine care and establishes that predictive models require diverse, multi-reviewer training data. Keywords: digital ledger technology; blockchain; crowdsourcing; clinical decision-making; low back pain; Soulbound Tokens
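The agreement statistic quoted in this abstract, Gwet's AC1, can be illustrated for the simplest case of two raters. The sketch below is an assumption-laden simplification: the study pooled four or more reviews per vignette, which requires the multi-rater form of the statistic, and the ratings used here are invented toy data, not study data. The function name `gwet_ac1` and its `categories` parameter are hypothetical.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories=None):
    """Gwet's AC1 for two raters (illustrative two-rater form).

    AC1 = (pa - pe) / (1 - pe), where pa is the observed agreement and
    pe = sum_k pi_k * (1 - pi_k) / (q - 1), with pi_k the mean of the two
    raters' marginal proportions for category k and q the number of
    categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if categories is None:
        # Fall back to the categories actually observed (an assumption;
        # pass the full category list when some categories go unused).
        categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    n = len(rater_a)
    pa = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pe = sum(
        pi * (1 - pi)
        for pi in ((count_a[k] + count_b[k]) / (2 * n) for k in categories)
    ) / (q - 1)
    return (pa - pe) / (1 - pe)


# Toy binary decision (1 = escalate, 0 = do not escalate), invented data.
a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
b = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(round(gwet_ac1(a, b), 2))
```

AC1 is often preferred over Cohen's kappa when one category dominates, because its chance-agreement term stays bounded in that regime.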
Purkayastha, D. S.
Inadequate discharge communication is a well-documented contributor to medication non-adherence, missed follow-ups, and preventable readmissions across healthcare systems worldwide. In resource-limited oncology settings, where patients are often low-literate, speak non-dominant languages, and manage complex multi-drug regimens, this problem is acute and largely unaddressed. We present Aakhyan, a vernacular patient communication platform that addresses the full post-discharge arc: from converting English-language discharge summaries into structured, voice-based vernacular explanations, through medication adherence support, to proactive follow-up management - all delivered via WhatsApp. The architecture is novel in its strict separation of concerns: a vision-language model performs structured JSON extraction from discharge images; all patient-facing content is generated deterministically from clinician-approved templates with community-sensitive vocabulary registers. This design eliminates the hallucination risk inherent in generative AI patient communication (documented at 18-82% in prior studies) while preserving the extraction capability of large language models. The platform supports four language registers, Bengali, Hindi, simplified English for tribal populations, and Assamese, with text-to-speech synthesis across all registers, including a custom grapheme-to-phoneme engine developed for Assamese phonology. Beyond discharge communication, the platform includes scheduled medication adherence nudges, interactive follow-up reminders, and a Daily Availability and Patient Notification System (DAPNS) that notifies patients the evening before their follow-up whether their doctor and required investigations are available, preventing wasted trips by rural patients who travel 2-6 hours to reach the centre. 
A 100-patient stratified randomised controlled study is planned at Silchar Cancer Centre, with structured teach-back assessment at 48-72 hours post-discharge as the primary comprehension outcome and preliminary clinical efficacy as a secondary objective. This paper describes the clinical rationale, technical architecture, safety framework, and positioning of Aakhyan within the existing literature on mHealth patient communication interventions.
Ogaki, S.; Kaneda, M.; Nohara, T.; Fujita, S.; Osako, N.; Yagi, T.; Tomita, Y.; Ogata, T.
Study Objectives: To evaluate wearable sleep staging across sleep apnea severity, including very severe sleep apnea defined as an apnea-hypopnea index (AHI) ≥ 50 events/h, and to assess how training-set composition affects performance in this subgroup. Methods: We analyzed 552 overnight recordings, 318 from the Sleep Lab Dataset and 234 from the Hospital Dataset. In the Hospital Dataset, 26.5% had very severe sleep apnea. We developed a deep learning model for sleep staging using RR intervals from wrist-worn photoplethysmography and three-axis accelerometry. Baseline performance was assessed by cross-validation under 5-stage and 4-stage staging. We examined night-level associations with AHI severity. We also compared the baseline model with an ablation model trained on the same number of recordings but with more Sleep Lab Dataset and lower-AHI Hospital Dataset recordings, evaluating both models in the very severe subgroup. Results: In 5-stage classification, Cohen's kappa was 0.586 in the Sleep Lab Dataset and 0.446 in the Hospital Dataset. Under 4-stage staging, the gap narrowed, with kappa values of 0.632 and 0.525, respectively. In the Hospital Dataset, performance declined with increasing AHI severity. Among 62 recordings with very severe sleep apnea, reducing high-AHI representation in training lowered kappa from 0.365 to 0.303. Conclusions: Wearable sleep staging performance declined across greater sleep apnea severity in this clinical cohort. Clinical utility may benefit from training data that better represent the target severity spectrum and from selecting staging granularity to match the intended use case. Statement of Significance: Repeated laboratory polysomnography is impractical for long-term sleep apnea management. Wearable sleep staging could support scalable monitoring, yet its reliability in clinically severe sleep apnea has remained unclear. This study developed and evaluated a wearable sleep staging approach in both sleep-laboratory and hospital cohorts.
The hospital cohort included many severe and very severe cases. Performance was lower in the hospital cohort and declined with greater sleep apnea severity. A coarser staging scheme reduced the gap between cohorts, and models trained without representative very severe cases performed worse in this target population. These findings highlight the value of severity-aware model development and motivate future multi-night home validation with reliability cues.
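The kappa values in this abstract depend on staging granularity, which a small Python sketch can make concrete. The abstract does not say how the 4-stage scheme is formed; merging N1 and N2 into a single "Light" stage is an assumption made here, and the six-epoch hypnograms are invented toy data.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equally long label sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    observed = sum(a == b for a, b in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Chance agreement from the product of the two marginal distributions.
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / n**2
    if expected == 1.0:
        return float("nan")  # undefined when both sequences use one label
    return (observed - expected) / (1 - expected)


# Assumed 5-to-4 stage mapping: N1 and N2 collapse into "Light".
TO_FOUR_STAGE = {"W": "W", "N1": "Light", "N2": "Light", "N3": "N3", "REM": "REM"}


def collapse(stages):
    return [TO_FOUR_STAGE[s] for s in stages]


reference = ["W", "N1", "N2", "N3", "REM", "N2"]   # toy scored hypnogram
predicted = ["W", "N2", "N2", "N3", "REM", "N1"]   # toy model output

kappa_5 = cohens_kappa(reference, predicted)
kappa_4 = cohens_kappa(collapse(reference), collapse(predicted))
print(round(kappa_5, 3), round(kappa_4, 3))
```

In this toy case every disagreement is an N1/N2 confusion, so collapsing the two stages removes all errors; real recordings would show a smaller gain, in line with the narrowing gap the abstract reports.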
Fonseca, P.; Ross, M.; van Meulen, F.; Asin, J.; van Gilst, M. M.; Overeem, S.
Objective: Long-term monitoring of obstructive sleep apnea (OSA) severity may be relevant for several clinical applications. We developed a method for estimating the apnea-hypopnea index (AHI) using wrist-worn, reflective photoplethysmography (PPG). Approach: A neural network was developed to detect respiratory events using PPG and PPG-derived sleep stages as input. The development database encompassed retrospective data from three polysomnographic datasets (N=3111), including a dataset with concurrent reflective PPG recordings from a wrist-worn device (N=969). The model was pre-trained with (transmissive) finger-PPG signals from all overnight recordings and then fine-tuned to wrist-PPG characteristics using transfer learning. Validation was performed on the test portion of the development set and on a fourth, external hold-out dataset containing both wrist-PPG and PSG data (N=171). Performance was evaluated in terms of AHI estimation accuracy and OSA severity classification. Main Results: The fine-tuned wrist-PPG model demonstrated strong agreement with the PSG-derived gold-standard AHI, achieving intra-class correlation coefficients of 0.87 in the test portion of the development set and 0.91 in the external hold-out validation set. Diagnostic performance was high, with accuracies above 80% for all severity thresholds. Significance: The study highlights the potential of reflective PPG-based AHI estimation, achieving high estimation performance in comparison with PSG. These measurements can be performed with relatively comfortable sensors integrated in convenient wrist-worn wearables, enabling long-term assessment of sleep disordered breathing, both in a diagnostic phase and during therapy follow-up.
Brito-Pacheco, D. A.; Giannopoulos, P.; Reyes-Aldasoro, C. C.
In this work, the impact of outliers on the performance of machine learning and deep learning models is investigated, specifically for the case of histopathological images of colorectal cancer stained with Haematoxylin and Eosin. The evaluation of the impact is done through the systematic comparison of one machine learning model (Random Forests) and one deep learning model (ResNet-18). Both models were trained with the popular NCT-CRC-HE-VAL-100K dataset and tested on the CRC-HE-VAL-7K companion set. Then, a curation process was performed by analysing the divergence of patches based on chromatic, textural and topological features of the training set and removing outliers to repeat the training with a cleaned dataset. The results showed that machine learning models can benefit more from improvements in the quality of data than deep learning models. Further, the results suggest that deep learning models are more robust to outliers as, through the training process, the architectures can learn features other than those previously mentioned.